The several statistical methods described for detecting test bias in
terms of various internal features of a [...] still too sketchy to
permit any strong conclusions. The evidence regarding black-white
comparisons, however, is based on a number of well-known, widely used,
quite diverse standardized individual and group tests of intelligence
given to a large representative sample of whites and blacks. The results
are unequivocal: none of the several subjective indices of cultural bias
shows any significant indication of bias in any of these tests when they
are used with blacks and whites. Correlation of raw scores with age,
internal consistency reliability, rank order of item difficulty,
relative difficulty of adjacent items, item correlation with total
score, loadings of items or tests on the general factor, and relative
frequencies in choice of error distractors: all are substantially the
same in black and white groups. It is concluded that these standardized
tests of intelligence (the Peabody Picture Vocabulary, Raven's
Progressive Matrices, Wechsler Intelligence Scale for Children,
Stanford-Binet, Wonderlic Personnel Test, and most likely other similar
tests) are not at all culturally biased for blacks and whites. They
behave statistically the same in both racial groups and do essentially
the same job in both groups. (Author/DEP)

Test Bias and Construct Validity

Arthur R. Jensen

University of California, Berkeley

Most psychologists are surely familiar with the claims of critics that
our mental tests are culturally biased against certain minorities,
especially blacks, and are culturally biased in favor of middle-class
whites. As a reminder, here are just a few direct quotations I have picked
up from the literature. They are all very typical.

"IQ tests are Anglocentric; they measure the extent to which an individual's
background is similar to that of the modal cultural configuration of
American society."

"IQ measures everyone by an Anglo yardstick. There is a conspiracy to
make a narrow, biased collection of items the real measure of all persons."

"Persons from backgrounds other than the culture in which the test was
developed will always be penalized."

"Intelligence tests are sadly misnamed because they were never intended
to measure intelligence and might have been more aptly called CB (cultural
background) tests."

"IQ tests yield the best results when taken by those who come from the
same cultural background as the devisers of the tests."

"Tests are clearly discriminatory against those who have not been exposed
to the culture, entrance to which is guarded by the tests."

"Racial, ethnic, and social class differences in mean IQ scores may not
be due to genes or environment, but are probably inherent in the
psycholinguistic, cultural, and temporal biases of the test."

"There are enormous social class differences in a child's access to the
experiences necessary to acquire the valued intellectual skills."

"Aptitude tests reward white and middle class values and skills,
especially ability to speak Standard English, and thus penalize
minority children because of their backgrounds."

"The middle-class environment is the birthright for IQ test-taking
ability."

"The IQ test is a seriously biased instrument that almost guarantees
that middle-class white children will obtain higher scores than any
other group of children. The more similar the experiences of two
people, the more similar their scores should be."

"IQ scores reported for blacks and low socioeconomic groups in the
U.S. reflect characteristics of the test rather than of the test takers."

"Culturally unfair tests may be valid predictors of culturally unfair
but nevertheless highly important criteria. Educational attainment,
to the degree that it reflects social inequities rather than intrinsic
merit, might be considered culturally unfair."

"The poor performance of Negro children on conventional tests is due to
the biased content of the tests, that is, the test material is drawn
from outside the black culture."



"The words included in vocabulary tests are based on the frequency of
their usage by whites. Blacks, who have differing vocabularies, may
do poorly."

Notice the main themes in these criticisms of mental tests:

1. The tests draw heavily upon specific middle-class cultural knowledge and
linguistic usage.

2. The implication is that blacks or other minorities in the U.S. do not share
a common culture or background of verbal and cognitive experience which is
sampled by the tests.

3. Similarity in test performance is a direct function of similarity in
cultural background.

4. The biggest differences in IQ scores are between lower and middle social
classes and majority and minority racial groups.

5. Culturally biased tests may nevertheless show good predictive validity for
predicting culturally biased criteria, like educational attainment and
success in certain occupations.

Where Do IQ Tests Show Differences?

First of all, let's gain a bit of perspective as to just where tests show
differences and how big those differences are relative to one another. I have
been able to do this with a number of different intelligence tests, using very
large samples of school children in California. I'll use the Wechsler
Intelligence Scale for Children-Revised (WISC-R) as an example, with data on
Full Scale IQs of more than 600 whites and 600 blacks representing a random
sample of California school children, ages 5 to 12.


Table 1 shows an analysis of variance, with the percentage of total
variance attributable to each of the sources. The figures easiest to grasp
are those in the last column, giving the average absolute difference in IQ.
We had a 10-point scale of socioeconomic class on these children. The average
IQ difference between all possible comparisons of the 10 social classes
(within each racial group) was only 6 IQ points. (The largest SES difference
was 26 IQ points in the whites and 12 IQ points in the blacks.)

The average race difference, independently of socioeconomic status (as
measured by Duncan's SES index), is 12 IQ points. But here is the important
point: the average difference between full siblings within the same family
is also 12 IQ points. If the Wechsler IQ test is so culturally biased, as
some critics claim, what kind of bias is it that produces as large a
difference between siblings as between blacks and whites? Or a larger
difference than the average difference between social classes? Notice, too,
that the average IQ difference between families within the same social class
(on a 10-point scale of SES) is 9 points, which is 33% greater than the
average difference between social classes.

In short, the notion that IQ tests discriminate the most between races
or social classes is just a myth. The IQ shows as much or more difference
among children in the same family, sharing the same parents and culture and
linguistic background, as between racial or social class groups. The
generalization is just not true that the more alike is the background of two
individuals [...] when the two individuals are identical twins.

Table 1

Estimated Percent of Variance and Average Absolute Difference
in WISC-R IQ Independently Associated with Race (White-Black),
Social Class, and Between and Within Families

Source                                             % Variance    Average IQ
                                                                 Difference

Social Class (Within Races)                             8  }          6
Race (Within Social Classes)                           14  } 22      12
Between Families (Within Race and Social Class)        29  }          9
Within Families (Siblings)                             44  } 73      12
Measurement Error                                       5             4
Total Sample                                          100            17

Sample size: Whites = 622; blacks = 622.
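One way to see how a percentage of variance relates to an average absolute difference is sketched below. This is only an illustration under assumptions that the text does not state: IQ is treated as normally distributed with a standard deviation of 15, and sibling differences are taken to reflect the within-family variance plus measurement error. Under normality, the expected absolute difference between two independent draws is 2 x SD / sqrt(pi). How the last column of Table 1 was actually computed is not given here.

```python
import math

def mean_abs_difference(sd):
    """Expected |X - Y| for two independent draws from a normal
    distribution with standard deviation sd."""
    return 2.0 * sd / math.sqrt(math.pi)

TOTAL_SD = 15.0  # assumption: conventional IQ standard deviation

# Two children drawn at random from the total sample.
total_diff = mean_abs_difference(TOTAL_SD)

# Assumption: siblings differ by within-family variance (44%) plus
# measurement error (5%), both taken from the % Variance column.
within_sd = math.sqrt((0.44 + 0.05) * TOTAL_SD ** 2)
sibling_diff = mean_abs_difference(within_sd)

print(round(total_diff, 1), round(sibling_diff, 1))
```

Under these assumptions the two values come out close to the 17 and 12 points shown for the total sample and for siblings. The same shortcut is not appropriate for the race row, which is a difference between two group means rather than between randomly paired individuals.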



Criteria of Cultural Bias

First, we must clearly distinguish between two concepts: culture loading
and culture bias. Culture loaded does not mean the same as culture biased.

Tests and test items can be ordered along a continuum of culture loading,
which is the specificity or generality of the informational content of the
test items. The narrower or less general the culture in which the test's
information content could be acquired, the more culture loaded it is. A test
may contain information that could only be acquired within a particular
culture. This can usually be determined simply by examination of the test
items. The specificity or generality of the content corresponds to its
cultural loading. The question "Name three parks in New York City" is, in
this sense, more culture-loaded than the question "How many 10¢ postage
stamps can you buy for $1?"

Whether the particular cultural content causes the test to be biased
with respect to the performance of any two (or more) groups in the population
is a separate issue. To the extent that the test contains cultural content
that is generally peculiar to the members of one group but not to the members
of another group, it is liable to be biased with respect to comparisons of
the test scores between the groups or predictions based on their scores.

Score differences per se, whether between individuals, social classes,
or racial groups, obviously cannot be a proper criterion of bias. There is
no basis for assuming a priori that any two populations should be equal in
whatever it is that the test is supposed to measure.

Legitimate criteria of test bias are of two general types: external
and internal, or predictive validity and construct validity.

For practical uses of tests, predictive validity is crucial. One
criterion of test bias is whether the intercepts and slopes of the regression
of criterion measures on test scores differ appreciably for the two
populations in question. In other words, a test is biased if its scores do
not predict equally well for both groups: the person's predicted performance
on the criterion (job, school, etc.) will be influenced by his group
membership and not just his test score. An unbiased test, on the other hand,
is colorblind. It makes the same prediction of your future performance based
just on your test score, and the prediction turns out just as accurate
whether you are white or black.

Reviews of the research on this point comparing white and black samples
are unequivocal with respect to the prediction of scholastic and job
performance by means of standard tests. There is a negligible difference in
the slopes and intercepts of regression lines for whites and blacks. A single
regression equation predicts equally well for both racial groups (Humphreys,
1973; Linn, 1973). Interestingly, the few exceptions reported in the
literature would favor the black groups if the tests were used for selection,
i.e., the difference in the regression lines is such that for any given test
score whites slightly out-perform blacks on the criterion. In brief, the
overwhelming evidence on the predictive validity of standard tests indicates
that they are not biased against blacks when compared with whites. (There are
too few studies of other ethnic groups to permit any general conclusions
about them.)
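The regression criterion can be illustrated with a small sketch. Everything in it is hypothetical: the two simulated groups, their score means, and the criterion equation are invented so that the criterion depends on the test score alone. Fitting a separate least-squares line in each group then shows what "negligible difference in slopes and intercepts" looks like numerically; it is not a reanalysis of the studies cited.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def simulate_group(rng, mean_score, n=2000):
    """Hypothetical data: the criterion depends only on the test score,
    never on group membership (that is, an unbiased test)."""
    scores = [rng.gauss(mean_score, 15) for _ in range(n)]
    criterion = [0.5 * s + 10 + rng.gauss(0, 5) for s in scores]
    return scores, criterion

rng = random.Random(0)  # fixed seed so the sketch is repeatable
slope_a, intercept_a = fit_line(*simulate_group(rng, 100))
slope_b, intercept_b = fit_line(*simulate_group(rng, 85))

print(f"group A: slope={slope_a:.2f} intercept={intercept_a:.1f}")
print(f"group B: slope={slope_b:.2f} intercept={intercept_b:.1f}")
```

A biased test in this sense would show up as group lines whose slopes or intercepts differ by more than sampling error, so that a single common equation over- or under-predicts for one group.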

Construct validity criteria of test bias are more complicated, but no
less important. It is very likely that tests which show little or no bias in
terms of the indices of construct validity are also unbiased in predictive
validity.

Construct validity criteria of bias refer to internal characteristics
of the test and the degree of similarity of their statistical properties from
one group to another. Construct validity, in the context of test bias, also
involves the question of whether a test, or a battery of tests, measures
individual differences in the same hypothetical ability in both of the
populations in question. Does our theory of what the test measures yield
predictions that are empirically borne out in the one group as well as in the
other? If there is a difference in group means on the test, does our theory
of what the test measures predict other previously unsuspected differences
between the two groups?

I shall illustrate the application of some of the criteria of internal
or construct bias on a variety of well-known standard tests of mental
abilities, mainly intelligence or IQ tests. In all the examples, the
populations for which evidence of test bias was sought by these criteria are
whites and blacks in the United States. We have more extensive test data on
these two groups than on any others in our population, and controversy over
test bias has revolved largely around the well-known white-black differences
in test scores.



Tests at the Extremes of Culture-Loading

First, let us contrast two tests that I believe most psychologists will
agree are widely separated on the culture-loading continuum: the Peabody
Picture Vocabulary Test (PPVT) and Raven's Progressive Matrices.

The PPVT consists of 150 plates, each with four pictures. The examiner
names one of the pictures and the subject is asked to point to it. The
vocabulary ranges from very easy, common, and concrete words to very rare
words and abstract concepts. The Progressive Matrices consists of 60 plates,
each with a missing part which the subject must select from a multiple-choice
set of six to correctly complete the pattern. Items range in complexity and
difficulty from a level that is passable by most three-year-olds up to a
level of difficulty beyond the capacity of the average adult. Figure 1 shows
typical PPVT and Raven items of moderate difficulty.



Insert Figure 1 about here 




Both of these tests were individually administered to about 600 white
and 400 black children, ages 6 to 12, in California schools. (Full details
of this study are given by Jensen, 1974.) The two groups show the typical
IQ difference of about one standard deviation (15 points) on both tests.

Correlation of Raw Scores with Age. The first indication that the
Peabody and Raven behave quite similarly in both racial groups is the fact
that the groups are about the same in the correlation between raw scores and
age in months, a correlation of about 0.70, for both tests in both racial
groups. If the tests were measuring something quite different in both groups,
it seems unlikely that the scores would have nearly the same correlation with
age in each group.

Internal Consistency Reliability. The internal consistency reliability
coefficient in the Peabody is .96, both for whites and for blacks; the Raven
reliabilities for whites and blacks are .90 and .86. (The Raven has a lower
reliability than the Peabody only because the Raven consists of fewer items.
Corrected for length of test, the Raven's reliability is higher than the
Peabody's.)

If one group were more careless than the other in taking the tests,
or made more haphazard guesses at the answers, or otherwise contaminated
their performance, we should expect quite different internal consistency
reliabilities. But we see that the reliabilities are highly comparable
for whites and blacks.
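To make the idea of internal consistency concrete, here is a minimal sketch using the Kuder-Richardson formula 20, a standard coefficient for items scored right or wrong. The text does not say which reliability coefficient was used, so KR-20 is an assumption, and both response matrices are invented toy data: one orderly pattern and one haphazard pattern of the kind careless guessing might produce.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    n_persons = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_persons  # proportion passing item i
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)

# Hypothetical orderly responses: anyone passing a harder item passes the easier ones.
consistent = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

# Hypothetical haphazard responses with no such ordering.
haphazard = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

print(kr20(consistent), kr20(haphazard))
```

The orderly matrix gives a high coefficient and the haphazard one a much lower (here negative) value, which is the contrast one would look for if one group were responding carelessly.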

Rank Order of Item Difficulty. The percentage P of the group passing
an item is an index of item difficulty. We can compare the rank order of
these values in the white and black groups and express the degree of
similarity between the groups by means of the correlation between the P
values. (All the correlations are corrected for attenuation, using the
correlation of each racial group with itself, i.e., the reliability of the
rank order of Ps within each racial group.)
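The computation described above can be sketched as follows. The P values and the two within-group reliabilities are hypothetical, and the rank correlation is computed as a Pearson correlation between ranks (Spearman's coefficient), which is one reasonable reading of "correlation between rank orders". The correction for attenuation is the usual one: divide the observed correlation by the square root of the product of the two reliabilities.

```python
import math

def ranks(values):
    """Ranks starting at 1 for the smallest value; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rank_correlation(xs, ys):
    """Spearman rank-order correlation."""
    return pearson(ranks(xs), ranks(ys))

def disattenuate(r_xy, r_xx, r_yy):
    """Correct an observed correlation for unreliability in both variables."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical proportions passing five items in two groups.
p_group_1 = [0.95, 0.80, 0.62, 0.40, 0.15]
p_group_2 = [0.90, 0.45, 0.70, 0.30, 0.05]

rho = rank_correlation(p_group_1, p_group_2)
# Hypothetical reliabilities of the rank order within each group.
rho_corrected = disattenuate(rho, 0.95, 0.95)
print(rho, rho_corrected)
```

In this toy case the second group has a lower pass rate on every item but only one pair of items changes places, so the rank correlation stays high even though the group means differ.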

On the Peabody test, the correlation between rank order of item
difficulty for blacks and whites is .987. The correlation between black males
and black females is .983. In other words, the rank order of item difficulties
on the Peabody is not as different between whites and blacks as between black
males and black females. (The correlation between white males and females
is .98[?].)

The cross-racial correlations of item difficulties in the Raven are all
.9[?] or greater when corrected for attenuation. We can safely conclude that
for the Peabody and the Raven, the rank order of item difficulty is the same
for whites and blacks.

This was found not to be the case when Peabody tests were obtained on
white school children in London, England, as compared with age-matched white
children in California. Quite a number of items differed markedly in rank
order of difficulty, and some were as many as 50 items apart in rank order
for Londoners and Californians. Obviously the linguistic backgrounds of
Londoners and Californians differ very much more than those of whites and
blacks residing in California. The English children, however, also found
certain words much easier, while some were more difficult, so that the
overall differences average out and both the English and the California white
children obtain about the same mean IQ. California blacks, however, have a
lower percent passing on every item in the test, but the rank order of item
difficulty for the blacks is the same as for whites.

If the Peabody Picture Vocabulary Test were really reflecting a cultural
background difference between whites and blacks, we should expect to see the
kind of differences in rank order of difficulty that we see between Londoners
and Californians. But we find no difference between blacks and whites in
the rank order of item difficulties.

Correlation of P Decrements. Let's remove the level of item difficulty
altogether and look at only the differences between item difficulties for
adjacent items in the test. This is P1 - P2, P2 - P3, etc., where P1 is the
percent passing item 1, P2 is the percent passing item 2, and so on. This
is a most sensitive index of group similarity. On this index, called the
P decrement, the equivalent forms A and B of the Peabody test are correlated
zero in the very same group of persons, even though the correlation of item
difficulties for Forms A and B in the same group is .97.

The correlation (corrected for attenuation) between whites' and blacks'
P decrements on adjacent items is .830. The correlation between P decrements
of males and females is .823 in whites and .880 in blacks. Thus, we see again
that the two races differ no more than do the two sexes of the same race.
The Raven's P decrements in whites and blacks correlate .980.
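A short sketch may help show why the P decrement is the more demanding index. The pass rates below are invented for two hypothetical groups; the code computes the adjacent-item differences P1 - P2, P2 - P3, and so on, and then compares the correlation between the groups' P values with the correlation between their decrements.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def p_decrements(p_values):
    """Differences in proportion passing between each item and the next one."""
    return [p_values[i] - p_values[i + 1] for i in range(len(p_values) - 1)]

# Hypothetical proportions passing five items, listed in test order.
p_group_1 = [0.95, 0.80, 0.62, 0.40, 0.15]
p_group_2 = [0.90, 0.70, 0.50, 0.25, 0.05]

r_p = pearson(p_group_1, p_group_2)
r_decrement = pearson(p_decrements(p_group_1), p_decrements(p_group_2))
print(round(r_p, 3), round(r_decrement, 3))
```

Because the overall easy-to-hard trend is removed, the decrement correlation in this toy example is far lower than the correlation of the P values themselves, which is consistent with the remark that two equivalent test forms can correlate .97 on item difficulty and zero on decrements.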



If the items of these tests were culturally biased for blacks, it would
be remarkable indeed that their rank order of difficulty and the differences
in difficulty between adjacent items should be virtually the same in both
the black and white groups. It would seem more remarkable that two tests
as dissimilar in culture-loading and information content as the Peabody
and the Raven should both show such high degrees of similarity between
blacks and whites in the rank order of P values and P decrements.

Matching Peabody and Raven Items. Are verbal tests more biased than
nonverbal? The small differences between the Peabody and Raven that we have
seen in the preceding analyses show very little difference between the tests
on the two indices of bias we have examined.

Going a step further, we perfectly matched Peabody and Raven items for
difficulty in the white group. For each of 35 Raven items we found a Peabody
item with exactly the same percent passing. If the culture-loaded Peabody
was more biased against blacks than the Raven, then we should expect blacks
to obtain lower scores on the Peabody than on the Raven, when the
difficulties of the two tests are perfectly matched in the white group. It
turned out that blacks showed no significant difference between Raven and
Peabody scores. Raven and Peabody items matched for difficulty in the white
group, it turns out, are thereby also matched for difficulty in the black
group.
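The matching procedure can be sketched in a few lines. All pass rates are invented, and the greedy pairing rule is only one simple way to do the matching; the text says that items were matched on exactly the same percent passing in the reference group but not how ties or leftovers were handled.

```python
def match_items(p_ref_test1, p_ref_test2, tolerance=0.0):
    """Pair each test-1 item with an unused test-2 item whose pass rate
    in the reference group is within `tolerance` of it."""
    pairs = []
    used = set()
    for i, p1 in enumerate(p_ref_test1):
        for j, p2 in enumerate(p_ref_test2):
            if j not in used and abs(p1 - p2) <= tolerance:
                pairs.append((i, j))
                used.add(j)
                break
    return pairs

# Hypothetical proportions passing in the reference group.
raven_ref = [0.90, 0.75, 0.50]
peabody_ref = [0.50, 0.90, 0.60, 0.75]

# Hypothetical proportions passing the same items in the comparison group.
raven_other = [0.80, 0.60, 0.35]
peabody_other = [0.34, 0.81, 0.50, 0.58]

pairs = match_items(raven_ref, peabody_ref)
# Mean Raven-minus-Peabody pass rate over matched pairs in the comparison group.
mean_gap = sum(raven_other[i] - peabody_other[j] for i, j in pairs) / len(pairs)
print(pairs, round(mean_gap, 3))
```

A mean gap near zero in the comparison group corresponds to the finding reported for the black sample; a clearly positive gap would correspond to the kind of difference favoring the nonverbal test.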

We tried the same analysis on a Mexican-American group. But it showed
a highly significant difference in favor of the Raven. Thus there is some
evidence that a vocabulary test in English may be a biased test of
intelligence for Mexican-Americans.

For reasons I need not go into here, I don't think the Peabody is an
especially good measure of general intelligence for either whites or blacks.
But I find no evidence that it is biased with respect to either of these
groups.

Item Discriminabilities Within and Between Racial Groups



In both the Peabody and the Raven we compared (a) the correlations
between single items and total score within each racial group, and (b)
the point-biserial correlations between single items and the racial
dichotomy. The first set of correlations, a, tells us how well each item
measures whatever the test as a whole is measuring and how well the item
discriminates among persons within a given racial group. The second set of
correlations, b, tells us how much the items discriminate between the two
racial groups. It turns out that the items that best measure individual
differences within each racial group are the very same items that
discriminate the most between the racial groups. These items have the highest
correlations with total score for both blacks and whites.
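The two sets of correlations can be computed as in the sketch below. The response matrices are invented and tiny. A point-biserial correlation is simply a Pearson correlation in which one variable is coded 0/1, so a single helper serves both for the item-total correlations within a group and for the correlations between items and group membership.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; it is a point-biserial when one list is coded 0/1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(responses):
    """(a) Correlation of each 0/1 item with the total score within one group."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], totals) for i in range(n_items)]

def item_group_correlations(responses, group_codes):
    """(b) Point-biserial correlation of each item with a 0/1 group code."""
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], group_codes) for i in range(n_items)]

# Hypothetical persons x items matrices for two groups.
group_1 = [[1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
group_0 = [[1, 1, 1], [1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]

within_1 = item_total_correlations(group_1)
within_0 = item_total_correlations(group_0)
codes = [1] * len(group_1) + [0] * len(group_0)
between = item_group_correlations(group_1 + group_0, codes)
print(within_1, within_0, between)
```

With real data one would then correlate the within-group values (a) with the between-group values (b) across items to see whether the same items rank highest on both.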

Analysis of Wrong Answers

Culture bias leads to the expectation that whites and blacks should
make different errors among the multiple-choice distractors of the items
they get wrong. But analysis of incorrect responses (errors) in the
Peabody shows that the errors are distributed in a non-chance fashion over
the multiple-choice distractors for each item in the same proportions for
whites and blacks. There were several significant exceptions to this finding
in Raven's Matrices: on some items blacks made different errors than whites.
But in every such instance it was found that the black children's proportions
of responses to the various error distractors were the same as the
proportions for white children who were approximately two years younger in
chronological age. Thus it appears that the few differences that were found
between white and black children are more clearly related to differences in
level of mental maturity than to cultural differences.
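A comparison of error distributions for a single item can be sketched with a chi-square test of homogeneity. The text does not name the statistic that was used, so this choice is an assumption, and the counts are invented. With three distractors there are two degrees of freedom, for which the conventional .05 critical value is about 5.99.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic testing whether two groups spread their wrong
    answers over an item's distractors in the same proportions."""
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts of wrong answers on one item, by distractor.
group_1_errors = [30, 50, 20]
same_pattern = [60, 100, 40]   # same proportions, twice as many errors
other_pattern = [50, 30, 20]   # a different favorite distractor

print(chi_square_homogeneity(group_1_errors, same_pattern))
print(chi_square_homogeneity(group_1_errors, other_pattern))
```

The first comparison gives a statistic of zero because the proportions are identical even though one group makes more errors overall; the second exceeds the critical value, which is the kind of exception described for some Raven items.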
Simulation of White-Black Differences

An overall analysis of variance was performed on the following factors
and all their interactions, for both the Peabody Picture Vocabulary and
Raven's Matrices: Race, Sex, Age, Items, and Subjects.

The interaction of greatest interest in terms of detecting culture bias
is the Race X Items interaction. The size of the Race X Items interaction,
relative to other sources of variance, is a sensitive index of bias. It turns
out that the interaction, though statistically significant, accounts for
less than 1 percent of the total variance in both the Peabody and the Raven.

We found that we could perfectly simulate, within the margin of sampling
error, this whole analysis of variance, with all its main effects and all
their interactions, using only the white sample. We called this comparison
of two different age groups of whites a Pseudo-race comparison.

We divided the entire white sample into two groups: a younger group
(ages 6 to 9), and a slightly overlapping older group (ages 8 to 11). The
same analysis of variance that was performed on blacks and whites, when
performed on these two different age groups of whites, reproduced all of the
features of the analysis of variance on the two racial groups. There is just
no difference between the two sets of variances, within the margin of sampling
error. This is true for both the Peabody and the Raven. The Pseudo-race
X Items interaction was also about 1 percent of the variance.

Finally, by doing the same analysis again on the two races, but this
time using whites of ages 6 to 9 and blacks of ages 8 to 11, we found that
the Race X Items interaction became quite nonsignificant (less than 0.2
percent of the total variance).
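The logic of the Group X Items interaction can be illustrated on a much reduced scale. The sketch below is not the Race X Sex X Age X Items X Subjects analysis described above. It works only on a hypothetical groups-by-items table of pass rates and splits the variation among those cell values into a group effect, an item effect, and a remainder that serves as the interaction. Because subject-level variance is left out, the shares are not comparable to the percentages quoted in the text.

```python
def variance_shares(table):
    """Split the variation among the cells of a groups x items table into
    group, item, and interaction shares (two-way layout, one value per cell)."""
    g = len(table)
    k = len(table[0])
    grand = sum(sum(row) for row in table) / (g * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(g)) / g for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_group = k * sum((m - grand) ** 2 for m in row_means)
    ss_item = g * sum((m - grand) ** 2 for m in col_means)
    ss_interaction = ss_total - ss_group - ss_item
    return {
        "group": ss_group / ss_total,
        "item": ss_item / ss_total,
        "interaction": ss_interaction / ss_total,
    }

# Hypothetical pass rates: the second group is 0.10 lower on every item,
# so the item profiles are parallel and there is no interaction.
additive = variance_shares([[0.9, 0.7, 0.5, 0.3],
                            [0.8, 0.6, 0.4, 0.2]])

# Hypothetical pass rates in which two items shift unevenly between groups.
uneven = variance_shares([[0.9, 0.7, 0.5, 0.3],
                          [0.8, 0.65, 0.35, 0.2]])

print(additive["interaction"], uneven["interaction"])
```

A uniformly lower pass rate in one group contributes only to the group share; the interaction share grows only when particular items are relatively harder for one group than for the other, which is why its size is read as an index of item bias.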

Further analyses in this vein failed to reveal any features of the Peabody
or Raven performance which will statistically distinguish blacks from whites
who are about two years younger, or which show any differences between blacks
and whites (of the same age) that do not show up also between groups of
younger and older whites.

In the light of these findings, anyone who maintains that these tests
are culturally biased with respect to black-white comparisons would have
to argue that the cultural differences between California blacks and whites
perfectly simulate age differences within the white group, for such a
diversity of indices as rank order of item difficulties, P decrements,
interitem correlations, choice of distractors, and item factor-loadings on
the first principal component, on tests as diverse as picture vocabulary and
progressive matrices! Obviously such an argument is grossly implausible.

A variety of other tests have shown the same sort of thing, that is,
black-white differences in test performance can be perfectly simulated,
quantitatively and qualitatively, by comparing groups of younger and older
white children. This has been shown for Piagetian conservation tests, copying
simple geometric designs, and developmental tests involving free-choice
preferences for matching stimuli on the basis of color, form, size, and
number (Jensen, 1975).

Indices of Internal Bias Applied to Other Tests

The types of analysis described above have been applied to other tests
as well, all with highly similar results. But certain outstanding points
are worth mentioning.

Stanford-Binet. The rank order of difficulty correlated between racial
or cultural groups gains greater cogency when the test items are more
heterogeneous, since it is so unlikely that a cultural difference between two
groups would result in the same rank order of difficulty in the two groups
over a set of items that differ markedly in their specific demands on
knowledge and skills.

There is probably no more heterogeneous collection of intelligence test
items to be found anywhere than the Stanford-Binet items included in the tests
for ages 3-1/2 to 5. The items involve size comparisons, simple picture
puzzles, discrimination of animal pictures, sorting colored buttons, verbal
comprehension, picture vocabulary, opposite analogies, aesthetic comparisons,
following directions, and so on.

In a doctoral thesis, Paul Nichols (1972) analyzed 16 items of the
Stanford-Binet from year III-6 through IV-6 (the most heterogeneous sequence
of items in the whole test) given to 2,514 black and 2,526 white children,
all between 4 and 5 years of age.

Note three important points: we are dealing with only a restricted
portion of the Stanford-Binet test (16 items from year III-6 through IV-6),
all the children are within a one-year age interval, and all are preschoolers
who haven't yet been exposed to the common culture of public schooling.
The correlation between the blacks and whites in the percent passing each of
these 16 Stanford-Binet items turns out to be .96. That's .96, without
correction for attenuation.



The P decrements correlate across races .50, which indicates considerable
racial similarity even in the differences in difficulty between adjacent items.

Thus, in this age range, at least, the Stanford-Binet IQ test doesn't
look at all culture biased. I would be quite surprised if black-white
comparisons turned out very differently from this for any other section of
the Stanford-Binet for any other age range.

It can also be noted that those items that critics most often single
out as examples of racially biased items either have the same rank order of
difficulty for blacks as for whites or are relatively easier items for the
blacks, which is just the opposite of the popular claims of culture bias
against blacks.


Wechsler Intelligence Scale for Children. The WISC provides some
striking examples of how invalid are the critics' subjective armchair analyses
of cultural bias in specific test items. For example, a favorite target
of test critics is the WISC Verbal Comprehension item: "What is the thing
to do if a fellow (girl) much smaller than yourself starts to fight with
you?" This item is often claimed to be culturally biased against blacks,
and even Dr. David Wechsler himself conceded this claim in an interview
with Dan Rather on the recent CBS-TV program "The IQ Myth."

After seeing the CBS "Myth" program, a psychology graduate student,
Frank Miele, had the innovative idea of looking up the item statistics on
this and other WISC items. He obtained WISC tests on large samples of
age-matched white and black school children in Georgia and looked at the rank
order of difficulty of this purportedly biased item within each racial group.
When the easiest item in the whole WISC is ranked 1 and the hardest is ranked
161, the rank order in difficulty of the "pick a fight" item is only 42
within the black group, as compared to 47 within the white group. In short,
this particular item is relatively easier for blacks than for whites. The
armchair claims of bias are thus easily debunked by just looking at the item
statistics.

The cross-racial correlation for rank order of difficulty over all 161
of the WISC items is .95. The correlation across the sexes within each racial
group is .97. The correlation of difficulty rank in whites with that in
blacks who average two years older is .96. Note that the WISC items, much
like the Stanford-Binet items, are also very heterogeneous. Yet the rank
order of difficulty of WISC items is not significantly different for whites
and blacks.

Wonderlic Personnel Test. This is a widely used general intelligence
test for adults, made up of 50 very heterogeneous items -- verbal, nonverbal,
spatial, numerical, logical, and so on. We have found that the correlation
in percent passing the 50 items, between samples of more than 700 blacks and
700 whites, is .94. The p decrements correlate .81.
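The rank-order comparison of item difficulty used in these analyses can be illustrated with a short Python sketch. It is an illustration only, not the procedure actually run on the WISC or Wonderlic data: the percent-passing values are hypothetical, and the function names (`average_ranks`, `pearson`, `rank_order_correlation`) are introduced here for the example. The sketch ranks the items by percent passing within each group and then correlates the two sets of ranks.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_order_correlation(p_a, p_b):
    """Correlation between two groups' rank orders of item difficulty.

    The direction of ranking (easiest = 1 or hardest = 1) does not change
    the result as long as both groups are ranked the same way.
    """
    return pearson(average_ranks(p_a), average_ranks(p_b))

# Hypothetical proportions passing six items in two groups (not study data).
p_group_a = [0.95, 0.88, 0.74, 0.60, 0.41, 0.22]
p_group_b = [0.90, 0.79, 0.70, 0.45, 0.48, 0.10]
rho = rank_order_correlation(p_group_a, p_group_b)
print(round(rho, 3))
```

In this made-up example only two adjacent items exchange places between the groups, so the rank-order correlation stays close to 1.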

We also tried to find out if 5 black and 5 white psychologists could
sort out the 8 most and the 8 least racially discriminating items when all
16 items were presented on separate cards randomly shuffled. The judges
sorted no better than chance. Again, armchair inspection of items is shown
to be a very poor clue as to which items will discriminate the most or the
least between blacks and whites.

On the other hand, we found that if you factor analyze all the item
intercorrelations within each racial group, the item's loading on the general
factor (or first principal component) correlates substantially with the item's
racial discriminability, and this is true within both racial groups. In other
words, the more highly a test item is correlated with the most general
factor common to all the items, within either racial group, the more highly
does the item discriminate between the racial groups.



Is g the Same g in Blacks and Whites?


The general intelligence factor or g can be defined as the first principal
component -- the largest single source of individual differences -- in a
heterogeneous collection of cognitive tests. An important criterion of the
construct validity of any test (or test item) as a measure of intelligence is its
loading on g when it is factor analyzed among a battery of other tests,
preferably tests that are heterogeneous in informational content and in the types
of cognitive processes involved in arriving at the correct answers.

How similar is this general factor for blacks and whites given the same
battery of cognitive tests?

Frank Miele and R. T. Osborne (personal communication) have sent me
correlational data on 541 white and 237 black children in Georgia schools.
All the children were given 29 cognitive tests of the greatest variety --
verbal, numerical, spatial, nonverbal reasoning, form board, vocabulary,
arithmetic, spelling -- you name it. The tests were borrowed from several
different standard batteries.



A principal components analysis was done separately in the white and
black samples. Also, each racial group was randomly split in half and a
principal components analysis was done in each of the split-half subgroups.
In this way we can determine the reliability of the first principal component
or g factor within each racial group.
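To make concrete what a loading on the first principal component is, here is a minimal Python sketch. It is not the analysis performed on the Georgia data. The correlation matrix is hypothetical, the function name `first_component_loadings` is introduced only for this example, and power iteration is used simply as one convenient way to obtain the largest component. It assumes all intercorrelations are positive, as is typical of a battery of cognitive tests.

```python
from math import sqrt

def first_component_loadings(corr, iterations=500):
    """Loadings of each test on the first principal component.

    corr is a symmetric correlation matrix given as a list of lists.
    Power iteration finds the dominant eigenvector; each loading is the
    eigenvector element multiplied by the square root of the eigenvalue.
    Assumes positive intercorrelations, so the equal-weight starting
    vector is not orthogonal to the dominant eigenvector.
    """
    n = len(corr)
    v = [1.0 / sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sqrt(sum(x * x for x in w))  # length of w; converges to the largest eigenvalue
        v = [x / eigenvalue for x in w]
    return [x * sqrt(eigenvalue) for x in v]

# Hypothetical intercorrelations among four tests (not data from the study).
corr_example = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]
loadings = first_component_loadings(corr_example)
print([round(x, 2) for x in loadings])
```

Running the same function on the correlation matrix from each group, or from each random half of a group, gives the sets of loadings whose agreement is then examined.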

The final step was to determine the correlation between the g factor
loadings, one set based on blacks and one set based on whites, over the 29
tests. This correlation turned out to be .68. Corrected for unreliability,
using the within-race split-half correlations in the usual
correction-for-attenuation formula, the corrected correlation becomes .97.
This high correlation constitutes very strong evidence that the g factor in
this large battery of tests is the same g for blacks as for whites.
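The usual correction for attenuation divides the observed correlation by the square root of the product of the two reliabilities. The Python sketch below shows the arithmetic. The observed value of .68 is the one reported above, but the two reliability values are hypothetical, since the within-race split-half correlations are not given here. The reported figures imply only that the geometric mean of the two reliabilities is about .68/.97, or roughly .70. The function name `correct_for_attenuation` is introduced for the example.

```python
from math import sqrt

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Observed correlation divided by the geometric mean of the two reliabilities."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)

observed = 0.68          # cross-racial correlation of g loadings reported in the text
rel_black = 0.70         # hypothetical reliability, not reported in the text
rel_white = 0.70         # hypothetical reliability, not reported in the text
corrected = correct_for_attenuation(observed, rel_black, rel_white)
print(round(corrected, 2))
```

With these illustrative reliabilities the corrected value rounds to .97. Other pairs of reliabilities with the same geometric mean would give the same result.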

Nichols (1972) intercorrelated 7 of the subtests of the Wechsler
Intelligence Scale for Children (WISC) combined with the Bender-Gestalt
Test, the Draw-a-Man Test, the Illinois Test of Psycholinguistic Abilities,
and tests of reading, spelling, and arithmetic achievement -- 13 tests in all.
This test battery was factor-analyzed separately in a group of 986 whites
and 975 blacks, all 7 years of age, drawn from Boston, Philadelphia, and
Baltimore. The g loadings of the 13 tests correlate .98 across the races.
(That's .98 without correction for attenuation.)

I have done the same cross-racial correlation of loadings on a battery
of 14 diverse cognitive and achievement tests in large samples of blacks
and whites in Grades 5 through 8. The cross-racial correlations of g loadings
are of about the same magnitude as the correlation of each racial group with
itself from one school grade to the next. Corrected for attenuation, the
cross-racial g correlations fluctuate around unity.

I have not found any evidence based on substantial or representative
groups of blacks and whites that the g factor measured by our standard tests
is in the least a different g in blacks than in whites.

If the tests were culturally biased for these two populations, we would
hardly expect the magnitude of the bias to be so uniform over all types of
items and tests that they would all have the same g loadings (within the
margin of sampling error) in black and white populations.



What is the Nature of g?

What is this g factor that practically all cognitive tests have in common
despite the great diversity of their content and the seemingly different
mental processes they call upon? No one really knows yet what makes
for g, certainly not in any basic physiological sense. But we do have some
idea as to its psychological nature.

By inspecting the g loadings of dozens of tests and many hundreds of
individual items, I am led to the conclusion that the key word regarding g
is complexity -- complexity of the mental operations required by a test item
in order for the person to produce the correct answer. Not difficulty per
se, but complexity is the key to g. Items that require some active mental
manipulation, some conscious mental transformation of the input, rather than
just sensorimotor and short-term memory ability or a habitual response, are
the most g-loaded items. The more mental manipulation and transformation
an item involves, the more it is g-loaded. This is true for blacks and
whites alike. I daresay it's true for all humans, and perhaps even for
all animals that possess a cerebral cortex.

If we hypothesize that the well-established average IQ difference of
about 15 points between blacks and whites is mainly a difference in g, in
the sense of a capacity for dealing with cognitive complexity in any form,
rather than as just a difference due to specific cultural content in the IQ
test, then we should predict that blacks and whites will differ less in
performance on tasks involving lesser cognitive complexity than on tasks
involving greater cognitive complexity. What do we find?

Reaction Time Studies. One experimental test of this complexity
hypothesis is based on differences in simple and choice reaction time to visual
and auditory stimuli. In all persons, reaction time (RT) increases as a
function of stimulus complexity, i.e., the number of bits of information in
the signal to which the person responds. It has also been shown that there
is no correlation between simple RT and IQ, but there is a negative
correlation between IQ and choice RT. That is, persons with higher IQs show quicker
RT in a choice situation.

Four independent experiments using quite different methods but comparing
simple and choice RTs in whites and blacks all show no significant race
difference for simple RT. But they all show a significant race (or race confounded
with SES) difference for choice or complex RT (Bosco, 1970; Jensen, 1975;
Noble, 1969; Poortinga, 1972). In these experiments, each person acts as
his own control. It is the difference between simple and choice RT that is
of primary interest, not their absolute values. Blacks, on the average, show
a larger difference between simple and choice RT than do whites. RT,
incidentally, is measured independently of total movement time, which is
only slightly correlated with RT and is unrelated to complexity. It should
be remembered that a 2-choice, 4-choice, or 8-choice RT task is still a
very low level of complexity as compared with most IQ test items, but it is
still more complex than the practically zero complexity of simple RT.

Forward and Backward Digit Span Memory. If g reflects capacity for
mental manipulation and transformation, and if it is the g factor on which
blacks and whites essentially differ, then we should expect a larger racial
difference on those tests requiring more mental manipulation and
transformation of the input in order to arrive at the output.

The forward and backward digit span tests of the Wechsler (WISC) lend
themselves nicely to a test of this hypothesis. For one thing, most clinical
psychologists judge the digit span test to be one of the least culture-loaded
subtests in the Wechsler battery. Moreover, digit span shows the smallest
average white-black difference of any of the subtests.

Everyone, I think, would agree that backward digit span -- repeating a
series of numbers in reverse order -- calls for somewhat more mental
manipulation and transformation than does forward digit span.

This being so, our theory of g should predict the following:
1. Backward digit span should correlate more highly with total IQ than should
   forward digit span.
2. Blacks and whites should differ more on backward than on forward digit
   span.

We tested these predictions in age-matched samples of 622 blacks and
622 whites randomly drawn from California schools (Jensen & Figueroa, in press).
Both predictions are fully borne out by the data. We found that backward
span correlates significantly higher with total IQ than does forward span,
and this is true within each racial group. We also found that the difference
between whites and blacks in backward memory span is more than twice as
large as the difference in forward memory span. When we control for
socioeconomic status, there is no significant race difference in forward memory
span, but the race difference remains substantial in backward memory span.

Figure 2 shows the total WISC IQs as a function of race and Duncan's
index of socioeconomic status.

Insert Figure 2 here

Figure 3 shows forward and backward digit span scores as a function of
race and SES. (The interaction of race X forward vs. backward span is
significant beyond the .001 level.)

Insert Figure 3 here

Thus, the theory of g as a capacity for dealing with complexity and
the conscious transformation of input has predicted two previously unknown
phenomena: (1) the differential correlation of forward and backward digit
span with IQ, and (2) the significantly smaller racial difference in forward
than in backward digit span. I don't know of any hypothesis invoking cultural
bias in the Wechsler tests that would have predicted either of these
interesting psychological phenomena.

Fig. 2. WISC-R Full Scale IQ of Black (N = 622) and White (N = 622)
samples as a function of socioeconomic status as measured

Conclusion

The several statistical methods I have described for detecting test bias
in terms of various internal features of persons' test performances and the
test's construct validity can of course be applied to any other groups in the
population. But the evidence regarding groups other than U.S. blacks and
whites is either lacking or is still too sketchy to permit any strong
conclusions.

The evidence regarding black-white comparisons, however, is based on a
number of well-known, widely used, and quite diverse standardized individual
and group tests of intelligence given to large representative samples of
whites and blacks.

The results are unequivocal: none of the several objective indices of
cultural bias shows any significant indication of bias in any of these tests
when they are used with blacks and whites. Correlation of raw scores with
age, internal consistency reliability, rank order of item difficulty (i.e.,
percent passing), relative difficulty of adjacent items, item correlation
with total score, loadings of items or tests on the general factor, and
relative frequencies in choice of error distractors -- all are substantially the
same in the white and black groups.

I conclude that these standardized tests of intelligence -- the Peabody
Picture Vocabulary, Raven's Progressive Matrices, Stanford-Binet, Wechsler
Intelligence Scale for Children, Wonderlic Personnel Test, and most likely
many other similar tests -- are not at all culturally biased for blacks and
whites. They behave statistically the same in both racial groups and do
essentially the same job in both groups.

Claims based on subjective armchair surmise and speculation about
cultural biases in specific test items -- the sole method of those critics
of tests who wish to foster the myth of culture bias -- are proven false by
the objective evidence. Moreover, the fact that it may be possible to
specially devise culturally biased items in no way proves that all of our
existing standard tests are culturally biased. Culturally loaded -- of course.
But not culturally biased. The distinction is crucial. The myth of culture
bias thrives on obscuring this distinction.

The large general factor measured by our standard tests of intelligence
is clearly the same factor in blacks as in whites. The hypothesis that this
general factor is a capacity for cognitive complexity, conscious mental
manipulation and transformation of stimulus inputs, has led to predictions
that are borne out empirically at a high level of significance.

Neither science nor the cause of social justice is served by denying
these findings. As researchers our response is to question, analytically
criticize, replicate results, determine their limits as to other mental tests
and populations, seek the causes of test score variance, pit alternative
theories against one another -- and openly renounce those hypotheses that
objective evidence repeatedly disproves.

Footnote

1. I am indebted to Jane R. Mercer for the WISC-R data and the SES
ratings. They have been described in detail in Jensen & Figueroa (in press).



